Identifying Unreliable and Adversarial Workers in Crowdsourced Labeling Tasks

Authors

  • Srikanth Jagabathula
  • Lakshminarayanan Subramanian
  • Ashwin Venkataraman

Abstract

In this paper, we study the problem of aggregating noisy responses from crowd workers to infer the unknown true labels of binary tasks. Unlike most prior work, which has examined this problem under the probabilistic worker paradigm, we consider a much broader class of adversarial workers and make no specific assumptions about their labeling strategy. Our key contribution is the design of a computationally efficient reputation algorithm that identifies and filters out such adversarial workers in crowdsourcing systems, given only the labels provided by the workers. Our algorithm combines the concept of optimal semi-matchings with worker penalties based on label disagreements to detect outlier worker labeling patterns. We prove that our algorithm can successfully identify low-reliability workers and workers adopting deterministic strategies, and that it is robust to manipulation by worst-case sophisticated adversaries who can adopt arbitrary labeling strategies to degrade the accuracy of the inferred task labels. Finally, we show that our reputation algorithm can significantly improve the accuracy of existing label aggregation algorithms on real-world crowdsourcing datasets.
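The abstract names the two ingredients, optimal semi-matchings and disagreement-based penalties, without spelling out how they fit together. The Python sketch below shows one hypothetical way to combine them; it is an illustration under stated assumptions, not the algorithm from the paper. It assumes that labels are stored as a dict from worker to `{task: +1 or -1}`; that every task which received both labels yields two "conflict" nodes, `(task, +1)` and `(task, -1)`, each adjacent to the workers who gave that label; that each conflict node is charged to exactly one adjacent worker through an optimal semi-matching, computed with a standard incremental augmenting-path method that routes each new node to the least-loaded reachable worker; and that a worker's penalty is simply the number of nodes charged to it. The function names `conflict_nodes`, `optimal_semi_matching`, `worker_penalties` and `filter_workers` are illustrative only.

```python
from collections import defaultdict, deque


def conflict_nodes(labels):
    """Build one node per (task, sign) for every task that received both +1 and -1.

    labels: dict mapping worker id -> dict mapping task id -> +1 or -1 (assumed format).
    Returns a list of ((task, sign), [workers who gave that sign]).
    """
    by_task = defaultdict(lambda: {1: [], -1: []})
    for worker, task_labels in labels.items():
        for task, y in task_labels.items():
            by_task[task][y].append(worker)
    nodes = []
    for task in sorted(by_task):
        groups = by_task[task]
        if groups[1] and groups[-1]:  # only tasks with a disagreement
            for sign in (1, -1):
                nodes.append(((task, sign), sorted(groups[sign])))
    return nodes


def optimal_semi_matching(nodes):
    """Assign each node to exactly one of its candidate workers with balanced loads.

    Nodes are inserted one at a time; each is routed along an alternating path to the
    least-loaded reachable worker, which keeps the semi-matching optimal (minimum sum
    of load * (load + 1) / 2 over workers).
    """
    adj = dict(nodes)
    assign = {}                       # node id -> worker
    assigned_to = defaultdict(list)   # worker -> node ids currently charged to it
    for nid, candidates in nodes:
        parent = {}                   # worker -> node through which the BFS reached it
        queue = deque()
        for w in candidates:
            if w not in parent:
                parent[w] = nid
                queue.append(w)
        best = None
        while queue:
            w = queue.popleft()
            if best is None or len(assigned_to[w]) < len(assigned_to[best]):
                best = w
            for other in assigned_to[w]:   # nodes that could be moved off w
                for w2 in adj[other]:
                    if w2 not in parent:
                        parent[w2] = other
                        queue.append(w2)
        # Shift every node on the path one step toward `best`, then place the new node.
        w = best
        while parent[w] != nid:
            moved = parent[w]
            prev = assign[moved]
            assigned_to[prev].remove(moved)
            assign[moved] = w
            assigned_to[w].append(moved)
            w = prev
        assign[nid] = w
        assigned_to[w].append(nid)
    return assign


def worker_penalties(labels):
    """Hypothetical scoring: penalty = number of conflict nodes charged to the worker."""
    penalty = {w: 0 for w in labels}
    for w in optimal_semi_matching(conflict_nodes(labels)).values():
        penalty[w] += 1
    return penalty


def filter_workers(labels, k):
    """Drop the k highest-penalty workers (ties broken by worker id)."""
    penalty = worker_penalties(labels)
    dropped = set(sorted(penalty, key=lambda w: (-penalty[w], w))[:k])
    return {w: tl for w, tl in labels.items() if w not in dropped}


if __name__ == "__main__":
    # Toy data: three workers agree on +1, one worker always answers -1.
    labels = {
        "a": {1: 1, 2: 1, 3: 1},
        "b": {1: 1, 2: 1, 3: 1},
        "c": {1: 1, 2: 1, 3: 1},
        "z": {1: -1, 2: -1, 3: -1},
    }
    print(worker_penalties(labels))           # {'a': 1, 'b': 1, 'c': 1, 'z': 3}
    print(sorted(filter_workers(labels, 1)))  # ['a', 'b', 'c']
```

In the toy run, the worker who contradicts the other three on every task accumulates the largest penalty and is the one removed, while the honest workers share the remaining charges evenly; the filtered labels could then be passed to any existing label aggregation method.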

Similar resources

Accurate Integration of Crowdsourced Labels Using Workers' Self-reported Confidence Scores

We have developed a method for using confidence scores to integrate labels provided by crowdsourcing workers. Although confidence scores can be useful information for estimating the quality of the provided labels, a way to effectively incorporate them into the integration process has not been established. Moreover, some workers are overconfident about the quality of their labels while others ar...

Crowd-Selection Query Processing in Crowdsourcing Databases: A Task-Driven Approach

Crowd-selection is essential to crowdsourcing applications, since choosing the right workers with particular expertise to carry out specific crowdsourced tasks is extremely important. The central problem is simple but tricky: given a crowdsourced task, who is the right worker to ask? Currently, most existing work has mainly studied the problem of crowd-selection for simple crowdsourced tasks su...

Multi-Objective Crowd Worker Selection in Crowdsourced Testing

Crowdsourced testing is an emerging trend in software testing, which relies on crowd workers to accomplish test tasks. Typically, a crowdsourced testing task aims to detect as many bugs as possible within a limited budget. For a specific test task, not all crowd workers are qualified to perform it, and different test tasks require crowd workers to have different experiences, domain knowledge, e...

Make Hay While the Crowd Shines: Towards Efficient Crowdsourcing on the Web

Within the scope of this PhD proposal, we set out to investigate two pivotal aspects that influence the effectiveness of crowdsourcing: (i) microtask design, and (ii) workers behavior. Leveraging the dynamics of tasks that are crowdsourced on the one hand, and accounting for the behavior of workers on the other hand, can help in designing tasks efficiently. To help understand the intricacies of...

An Information Theoretic Approach to Managing Multiple Decision Makers

Citizen science and human computation involves working with multiple, untrusted decision makers, whose performance depends on training, rewards, ability and interest. We first present methods for screening workers and selecting informative objects to label. We then demonstrate Bayesian Classifier Combination as an effective method for classifying documents using unreliable crowdsourced labels. ...

Journal:
  • Journal of Machine Learning Research

Volume 18

Publication date: 2017